addon: Add support for logging agent health checking #42
Conversation
Pull Request Test Coverage Report for Build 9404429060 (Coveralls)
General comment: currently MCOA supports having multiple signals enabled or disabled, so ideally the health prober should account for the fact that not all signals are always enabled (if possible).
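To make that concrete, here is a minimal sketch (not code from this PR) of how the work-based prober's fields could be built per enabled signal, assuming the addon-framework's `agent.ProbeField` API. The `Options` type and `probeFields` helper are hypothetical; the constants mirror the `internal/addon/var.go` values shown further down the thread.

```go
package addon

import (
	"open-cluster-management.io/addon-framework/pkg/agent"
	workv1 "open-cluster-management.io/api/work/v1"
)

// Options is a hypothetical stand-in for however MCOA tracks enabled signals.
type Options struct {
	LoggingEnabled bool
	TracingEnabled bool
}

// probeFields registers a probe only for the signals that are enabled, so a
// disabled signal never marks the addon unhealthy.
func probeFields(opts Options) []agent.ProbeField {
	var fields []agent.ProbeField
	if opts.LoggingEnabled {
		fields = append(fields, agent.ProbeField{
			ResourceIdentifier: workv1.ResourceIdentifier{
				Group:     "logging.openshift.io",
				Resource:  ClusterLogForwardersResource,
				Name:      SpokeCLFName,
				Namespace: SpokeCLFNamespace,
			},
			ProbeRules: []workv1.FeedbackRule{{
				Type:      workv1.JSONPathsType,
				JsonPaths: []workv1.JsonPath{{Name: clfProbeKey, Path: clfProbePath}},
			}},
		})
	}
	if opts.TracingEnabled {
		fields = append(fields, agent.ProbeField{
			ResourceIdentifier: workv1.ResourceIdentifier{
				Group:     "opentelemetry.io",
				Resource:  OpenTelemetryCollectorsResource,
				Name:      SpokeOTELColName,
				Namespace: SpokeOTELColNamespace,
			},
			ProbeRules: []workv1.FeedbackRule{{
				Type:      workv1.JSONPathsType,
				JsonPaths: []workv1.JsonPath{{Name: otelColProbeKey, Path: otelColProbePath}},
			}},
		})
	}
	return fields
}
```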
Force-pushed from 61c997a to f65db04
internal/addon/var.go (outdated)

```go
OtelcolName = "spoke-otelcol"
OtelcolNS   = "spoke-otelcol"
```
We need to get this information from the ManagedClusterAddOn object. The same for Logging.
The naming of these variables did a poor job of conveying which Name/Namespace we are referring to here. These are the name and namespace of the resources that will be created on the spoke clusters. They don't change and are always the same unless we change the manifests. I've added tests so that if we do change the manifests, the tests fail and force us to update these names.
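For illustration, a minimal sketch of such a test, not the PR's actual code: it pins the rendered spoke resource's name/namespace to the prober constants. The embedded `spokeOTELColYAML` manifest here is a hypothetical stand-in so the sketch is self-contained; constant names follow the later revision of `internal/addon/var.go`.

```go
package addon

import (
	"testing"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/yaml"
)

// Hypothetical stand-in for the manifest embedded in the addon.
const spokeOTELColYAML = `
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: spoke-otelcol
  namespace: spoke-otelcol
`

func TestSpokeOTELColNameIsPinned(t *testing.T) {
	var obj unstructured.Unstructured
	if err := yaml.Unmarshal([]byte(spokeOTELColYAML), &obj.Object); err != nil {
		t.Fatal(err)
	}
	// If someone renames the manifest, this fails and forces the prober
	// constants to be updated to match.
	if obj.GetName() != SpokeOTELColName || obj.GetNamespace() != SpokeOTELColNamespace {
		t.Fatalf("manifest is %s/%s, prober expects %s/%s",
			obj.GetNamespace(), obj.GetName(), SpokeOTELColNamespace, SpokeOTELColName)
	}
}
```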
Hi @JoaoBraveCoding, it seems that the health checking implementation in this PR is static: if a user modifies the ClusterManagementAddOn, e.g. adds a new collector to a spoke or cluster set, the health of that collector is not reflected here. This use case was discussed as part of the proposal https://hackmd.io/aBUzPTEZRuCPp_kZqm4x2A?view. The addon should reflect the health of all collectors specified in the placements.
So the health checking is indeed static for the moment, because the resources MCOA currently generates are also static in their namespace/name (as of this PR, the CLF is always called "mcoa-instance" and the OTELCol is also called "mcoa-instance"; we can change to a different name, I'm not set on that). If the user changes the OTELCol reference on the ClusterManagementAddOn, there is no problem, because MCOA only cares about the Spec of the resource.
The hackmd does mention a future requirement to deploy multiple instances of OTELCol (Deployment + DaemonSet). However, this is not possible today from the addon-framework's point of view: the framework doesn't support 2 or more references to the same GVK. To change this, we will need to open a PR against the addon-framework to implement that functionality.
The same goes for the health probes: currently they do not allow a dynamic list of ProbeFields/ResourceIdentifier; this is something we will need to add support for in the future.
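To illustrate why the checking is static, here is a minimal sketch, assuming the addon-framework's work-based prober API: the ProbeFields list is fixed when the agent addon is registered. `probeFields` is the hypothetical helper from the sketch near the top of the thread, and `healthCheck` is sketched at the end of the page.

```go
// Hedged sketch (not the PR's code): a work-based health prober whose
// ProbeFields are fixed at registration time. Changing the probed set
// later, e.g. per placement, is not possible with this shape of API.
func newHealthProber(opts Options) *agent.HealthProber {
	return &agent.HealthProber{
		Type: agent.HealthProberTypeWork,
		WorkProber: &agent.WorkHealthProber{
			ProbeFields: probeFields(opts), // hypothetical helper from the earlier sketch
			HealthCheck: healthCheck,       // sketched at the end of the page
		},
	}
}
```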
It's an important requirement for us to support multiple collectors; I am not sure we can rely on getting this change into the addon-framework.
@iblancasa could you please chime in here and describe your idea of implementing this in a similar way to how secrets are handled by the MCOA?
> @iblancasa could you please chime in here and describe your idea of implementing this in a similar way to how secrets are handled by the MCOA?

The problem we had was related to reconciliation, for instance. I was planning to follow the same approach we currently have for secrets and implement reconciliation and everything else if, for whatever reason, this could not be implemented as part of the addon-framework. But that will need further discussion, and I don't think this thread is the best place to look for that solution.
@pavolloffay FWIW the ACM ticket where we know they are interested in external contributions is ACM-11509. IMHO, either way we will end up in a state where our addon supports multiple GVKs. However, we still have time until 2.12 to first try contributing to the addon-framework and then fall back to custom code if needed. WDYT?
Good job! Looks promising 👀
Force-pushed from abfc92b to 26c9fb5
LGTM. The main open question is how this works with custom placements in the ClusterManagementAddon on the hub.
internal/addon/var.go (outdated)

```go
ClusterLogForwardersResource = "clusterlogforwarders"
SpokeCLFName                 = "instance"
SpokeCLFNamespace            = "openshift-logging"
clfProbeKey                  = "isReady"
clfProbePath                 = ".status.conditions[?(@.type==\"Ready\")].status"

OpenTelemetryCollectorsResource = "opentelemetrycollectors"
SpokeOTELColName                = "spoke-otelcol"
SpokeOTELColNamespace           = "spoke-otelcol"
otelColProbeKey                 = "replicas"
otelColProbePath                = ".spec.replicas"
```
Why do we replicate the resource names here? Doesn't loggingv1 provide one for clusterlogforwarders? Same question for opentelemetrycollectors.

@JoaoBraveCoding @iblancasa Do we want to keep deploying instances of opentelemetrycollector on the spokes with the name spoke-otelcol? WDYT about starting to refer to them with a prefix like mcoa-...? Same for clusterlogforwarders.

@JoaoBraveCoding Can the health prober work with CLFs/OtelCollectors named differently per clusterset? For example, one creates a CLF resource called mcoa-eu-clusters-instance for a custom placement in the ClusterManagementAddon. How does probing work here?
> @JoaoBraveCoding @iblancasa Do we want to keep deploying instances of opentelemetrycollector on the spokes with the name spoke-otelcol? WDYT about starting to refer to them with a prefix like mcoa-...? Same for clusterlogforwarders.

Yes, it makes sense.
But, anyway, these values should be obtained from the AddonConfig, not hardcoded.
> Why do we replicate the resource names here? Doesn't loggingv1 provide one for clusterlogforwarders? Same question for opentelemetrycollectors.

AFAIK, unless I'm missing something, neither package provides a constant that gives me the plural of the resource.

> @JoaoBraveCoding @iblancasa Do we want to keep deploying instances of opentelemetrycollector on the spokes with the name spoke-otelcol? WDYT about starting to refer to them with a prefix like mcoa-...? Same for clusterlogforwarders.

I'm fine with calling them both mcoa-instance or anything else. @iblancasa At least for now, I don't see a use case where we would need to make these names configurable using AddonConfig.

> @JoaoBraveCoding Can the health prober work with CLFs/OtelCollectors named differently per clusterset? For example, one creates a CLF resource called mcoa-eu-clusters-instance for a custom placement in the ClusterManagementAddon. How does probing work here?

Yes, it can, because once the CLFs/OtelCollectors are rendered into the ManifestWorks they will all have the same name (nowadays the CLF is instance and the OtelCollector spoke-otelcol), so unless we want to make these names customizable it will work. If, for any reason, we want these names to change depending on, for instance, the placement name, then this will not work and we will need improvements to the framework.
> I'm fine with calling them both mcoa-instance or anything else. @iblancasa At least for now, I don't see a use case where we would need to make these names configurable using AddonConfig.

I would prefer to keep it driven by the AddonConfig, since it's the "source of truth". Also, we will support, at least for the OpenTelemetryCollector, the creation of multiple instances, and those will be provided by users.
Co-authored-by: Joao Marcal <joao.marcal12@gmail.com>
Force-pushed from 36a164a to ed25381
Force-pushed from ed25381 to 69a4f22
Force-pushed from 69a4f22 to 6a24ada
Holding, as my latest changes introduced a bug.
Tested, issue fixed.
I'm merging this one since we have approval from the two teams.
With this PR, the addon agent will perform health checks on the spoke cluster's ClusterLogForwarder and OTEL collector instances:
- For the ClusterLogForwarder, it checks that the status condition of type Ready is "True".
- For the OTEL collector, if the .spec.replicas field is 0, it will report AVAILABLE = False to the addon.

Notes to reviewers:
- … ManagedClusterAddon resource;
- … ManifestWork;

Alternatives:
- … ManifestWork were created on the spoke cluster.
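To illustrate the check semantics described above, here is a minimal sketch, assuming the addon-framework's AddonHealthCheckFunc signature and the constants from internal/addon/var.go; this is not the PR's exact code.

```go
package addon

import (
	"fmt"

	workv1 "open-cluster-management.io/api/work/v1"
)

// healthCheck interprets the ManifestWork feedback values gathered by the
// probes: the CLF must report Ready == "True", and an OTEL collector whose
// .spec.replicas is 0 is treated as unavailable.
func healthCheck(identifier workv1.ResourceIdentifier, result workv1.StatusFeedbackResult) error {
	for _, value := range result.Values {
		switch {
		case identifier.Resource == ClusterLogForwardersResource && value.Name == clfProbeKey:
			if value.Value.String == nil || *value.Value.String != "True" {
				return fmt.Errorf("clusterlogforwarder %s/%s is not ready", identifier.Namespace, identifier.Name)
			}
			return nil
		case identifier.Resource == OpenTelemetryCollectorsResource && value.Name == otelColProbeKey:
			if value.Value.Integer == nil || *value.Value.Integer == 0 {
				return fmt.Errorf("opentelemetrycollector %s/%s has 0 replicas", identifier.Namespace, identifier.Name)
			}
			return nil
		}
	}
	return fmt.Errorf("missing probe value for %s %s/%s", identifier.Resource, identifier.Namespace, identifier.Name)
}
```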